CMB ANALYSIS 



J. RICHARD BOND 

Canadian Institute for Theoretical Astrophysics 
Toronto, ON M5S 3H8, CANADA 

AND 

ROBERT G. CRITTENDEN 

DAMTP, Centre for Mathematical Sciences, 

University of Cambridge, Cambridge CBS OWA, United Kingdom 

Abstract 

We describe the subject of Cosmic Microwave Background (CMB) analysis 
— its past, present and future. The theory of Gaussian primary anisotropies, 
those arising from linear physics operating in the early Universe, is in rea- 
sonably good shape so the focus has shifted to the statistical pipeline which 
confronts the data with the theory: mapping, filtering, comparing, cleaning, 
compressing, forecasting, estimating. There have been many algorithmic ad- 
vances in the analysis pipeline in recent years, but still more are needed for 
the forecasts of high precision cosmic parameter estimation to be realized. 
For secondary anisotropies, those arising once nonlinearity develops, the 
computational state of the art currently needs effort in all the areas: the 
Sunyaev-Zeldovich effect, inhomogeneous reionization, gravitational lens- 
ing, the Rees-Sciama effect, dusty galaxies. We use the Sunyaev-Zeldovich 
example to illustrate the issues. The direct interface with observations for 
these non-Gaussian signals is much more complex than for Gaussian pri- 
mary anisotropies, and even more so for the statistically inhomogeneous 
Galactic foregrounds. Because all the signals are superimposed, the sep- 
aration of components inevitably complicates primary CMB analyses as 
well. 

1. Introduction 

1.1 What is CMB Analysis? The subject we call "CMB analysis" is a 
blend of basic theory, simulation and statistical data analysis. The goal 
of CMB analyzers is to extract the physics of the various signals that 
contribute to the data. CMB analyzers may therefore be theorists or ex- 
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perimentalists. Depending upon how advanced the state-of-the-art is, the 
relevant analysis topic may lean more towards the theory/simulation side 
or more towards the statistical analysis and Monte Carlo simulation side. 
In the CMB field, we adopt a theorist's distinction between primary, sec- 
ondary and foreground anisotropics, though of course the observations do 
not know of this distinction. The primary ones are those we can calculate 
using linear perturbation theory (or, in the case of cosmic defects, with 
linear response theory). This covers the crucially important epoch of pho- 
ton decoupling near redshift 1100, and even until the current time on very 
large scales, and to redshifts of a few on intermediate scales. Secondary 
anisotropies are those associated with nonlinear phenomena, either calcu- 
lable via weakly nonlinear perturbation theory, semi-analytic methods or by 
more direct simulation of nonlinear patterns. Gravitational lensing effects 
on the CMB, quadratic nonlinearities and the kinetic Sunyaev-Zeldovich 
effect associated with Thomson scattering from flowing matter, reioniza- 
tion inhomogeneities, the thermal Sunyaev-Zeldovich effect associated with 
Compton upscattering from hot gas, the Rees-Sciama effect associated with 
nonlinear potential wells, all come under this heading. We also traditionally 
call emission by dust in high redshift galaxies a secondary process, and even 
emission from extragalactic radio sources. On top of this, there are various 
foreground emissions from dust and gas in our Milky Way galaxy — signals 
which are nuisances to the primary CMBologist but of passionate interest 
to interstellar medium astronomers. Fortunately, most of the secondary and 
foreground signals have very different dependencies on frequency (Fig. 1), 
and rather statistically distinct sky patterns (Fig. 2). Collateral informa- 
tion from non-CMB observations can also be brought to bear to unravel 
the various components. 

1.2 Primary Theory and Current Data: We think we know how to 
calculate the primary signal in exquisite detail. The fluctuations are so 
small at the epoch of photon decoupling that linear perturbation theory 
is a superb approximation to the exact non-linear evolution equations. In- 
tense theoretical work over three decades has put accurate calculations of 
this linear cosmological radiative transfer problem on a firm footing, and 
there are speedy, publicly available and widely used codes for evaluation 
of anisotropies in a wide variety of cosmological scenarios, e.g. "CMBfast" 
[1]. These have been further modified by many different groups to attack 
even more structure formation models. 

The simplest and thus least baroque versions of inflation theory predict 
that the fluctuations from the quantum noise that give rise to structure 
form a Gaussian random field. Linearity implies that this translates into 
temperature anisotropy patterns that are drawn from a Gaussian random 
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Figure 1. Foregrounds and secondary anisotropics depend differently on pho- 
ton frequency, the key property for separating the components. Plotted is the 
frequency dependence of the effective thermodynamic temperature fluctuations, 
normalized to their values at 75 GHz. Thus the primary CMB fluctuations corre- 
spond to a horizontal line in this figure, synchrotron and bremsstrahlung dominate 
at low frequencies, dust at high. The bands represent a measure of our uncertainty 
in the appropriate foreground spectra. The highly distinctive shape for the Sun- 
yaev-Zeldovich (SZ) effect is negative at low frequencies, positive at high. The 
actual level of contamination depends on the angular scale. The foreground emis- 
sion is minimal around 90 GHz, not far from where the CMB intensity peaks. 
Detector frequencies for some notable experiments are denoted by the symbols at 
the top. Currently, detectors below 100 GHz are HEMTs, above are bolometers. 
Note the wide coverage planned for the Planck satellite which uses both. 



process and which can be characterized solely by their power spectrum. 
The emphasis is therefore on confronting the theory with the data in power 
spectrum space, as in Fig. 3. The panels show how the power spectrum Cg 
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Figure 2. A schematic view by Bouchet and Gispert of many of the possible mi- 
crowave foregrounds that need to be separated from the primary CMB anisotropy 
pattern shown in the 10° map at the bottom. These include noise effects (striping), 
galactic foregrounds (dust, synchrotron and bremsstrahlung or free- free emission) 
as well as secondary anisotropies (extragalactic radio and infrared galaxies, CMB 
upscattering by hot gas in clusters - "clusters Y-SX," and the Doppler effect aris- 
ing from moving clusters - the kinetic SZ effect or "clusters AT/T"). Each of these 
has a unique temperature pattern on the sky, and all but the kinetic SZ effect have 
a different spectral signature (Fig. 1). 
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responds as various cosmological parameters are individually changed. 

Until a few years ago, the emphasis was on fairly restricted parameter 
spaces, but now we consider spaces of much larger dimension. These are 
used for forecasts of how well proposed experiments can do, and for the 
first round of data from a sequence of high-precision experiments covering 
large areas of sky: ground-based single dishes and interferometers; balloons 
of short and long duration (LDBs, ~ 10 days) and eventually of ultra-long 
duration (ULDBs, ~ 100 days); NASA's MAP satellite in 2001 and ESA's 
Planck Surveyor in 2007. The lower right panel of Fig. 3 gives a forecast of 
how well we think that the two satellite experiments can do in determining 

if everything goes right. This rosy prognosis should be compared with 
the compressed bandpowers shown in the other panels estimated from all 
of the current data: DMR and some 19 other pre-conference "prior" ex- 
periments, TOCO [2], unveiled about the time of the conference, and four 
that appeared after the conference, the Boomerang flights (the LDB flight 
[3, 4] and its North America test flight [5]) the Maxima-I [6, 7] flight, the 
first results from CBI, the Cosmic Background Imager [8], and from DASI, 
the Degree Angular Scale Interferometer [9]. Although new methods have 
been applied to these data sets, much was in place and some algorithms 
were well tested on previous data, in particular DMR, SK95, MSAM and 
QMAP. 

Although we sketch the current state of the art in CMB analysis in 
this paper, note that this is a very fast moving subject. A large number of 
researchers in a handful of groups are working on a broad range of issues, of- 
ten as members of the experimental teams associated with the increasingly 
complex experiments. A snapshot of the state two years ago and references 
to the relevant papers was given in [33] , but many advancements have been 
made since then. For example, the MAP plan for analysis is described in 
[40]. One issue that still needs to be addressed is that the strong simplify- 
ing assumptions of signal and noise Gaussianity cannot be correct in detail, 
and could yield misleading results. 

1.3 Primary Parameter Sets for Gaussian Theories: A "minimal" 
inflation-motivated parameter set involves about a dozen parameters, e.g., 
Wb,^cdm, v hdm ,uj er , 0, k ,n A , tc, <T8,n s , f t ,n t }. For many species we use 
ujj = iljh 2 rather than Qj since it directly gives the physical density of 
the particles rather than as a ratio to the critical density: thus, wj, for 
baryons, uJ c dm f° r cold dark matter, uJhdm f° r hot dark matter (massive 
but light neutrinos), and uj er for relativistic particles present at decoupling 
(photons, very light neutrinos, and possibly weakly interacting products 
of late time particle decays). The total non-relativistic matter density is 

<^m = ^hdm + ^cdm + &b- 
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The curvature energy relative to the closure density is Ofc = 1 — Qtot and 
r^A is the energy relative to closure in the cosmological constant. Oa can also 
be interpreted as a vacuum energy, a sector which may be more complex 
and require more parameters. For example, a scalar field which dominates 
at late times, "quintessence", has potentials and other interactions which 
must be set. A simple phenomenology of quintessence has added to an 
average equation of state pressure-to-density ratio wq = Pq/Pq- 

Another factor crucial for predicting the CMB anisotropies is Tc, the 
Compton depth to Thompson scattering from now back through the period 
when the universe was reionized by early starlight. 

To characterize the initial fluctuations, amplitudes and tilts are required 
for the various fluctuation modes present. For adiabatic scalar perturba- 
tions, the tilt is n s and the amplitude is most appropriately the overall 
power in curvature fluctuations, but is often replaced by erf, a density 
bandpower on the scale of clusters of galaxies, 8h _1 Mpc. For tensor modes 
induced by gravity waves, the amplitude parameter is some measure of the 
ratio of gravity wave to scalar curvature power, ft; usually a ratio of Q's 
is used, e.g., at 1=2 or 10. Inflation theories often give relationships be- 
tween the tensor tilt nt and ft which can be used to reduce the parameter 
space e.g., [10]. Parameters for the amplitude and tilt of scalar isocurvature 
modes could also be added to the mix. The characterization of the modes 
present in the early universe by amplitude and tilt could be expanded to 
include parameters describing the change of the tilts with scale (dn s /dln k), 
the change of that change, and so on. Relatively full functional freedom in 
n s (k) is possible in inflation models, and substantial freedom also exists for 
n t {k). 

1.4 Well and Poorly Determined Parameter Combinations: In 

Fig. 3, we compare the compressed bandpowers with Ci sequences as Ub, 
Q-toti ^hdm an d n s are individually varied. These show the discrimina- 
tory power of the current data. Fixing the age at 13 Gyr as we do here 
defines a specific relation between u>k, wa and u m . From these figures, we 
expect that four can be determined reasonably well (LOb,Qtot, n s and the 
overall amplitude). This is borne out in the full study with the real data, 
modulo parameter correlations which mean the best determined quantities 
are linear combinations of parameters, or parameter eigenmodes [12, 13, 10]. 
For example, it is actually a combination of Sltot and which is well de- 
termined, but it happens to be Sltot -dominated. Another orthogonal combi- 
nation of the two will remain very poorly determined no matter how good 
the CMB data gets [12], an example of a near-degeneracy among cosmo- 
logical parameters. Other near-degeneracy examples are less severe: e.g., 
Qhdm/(Qcdm + ^hdm) is not well determined by current data, but could be 
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Figure 3. The Ce anisotropy bandpower data, including the recent balloon-borne 
and interferometry data, compressed to 13 bands using the methods of [14] are 
compared with various Ci model sequences, each for universes with age 13 Gyr 
(left to right): (1) untilted flat ACDM sequence with Hq varying from 50 to 90, 
horn to 0.87; (2) n s varying from 0.85 to 1.3 for an H = 70 (fl A =.66) 
ACDM model - dotted is 0.85 with gravity waves, next without, upper dashed 
is 1.25, showing visually why n s is found to be nearly unity; (3) f^sh 2 varying 
from 0.0063 to 0.10 for the H Q =7Q ft A =.66 ACDM model; (4) neutrino fractions 
Qhdm/Qm varying from 0.1 to 0.95 for an Hq=50 f2 A =0 sequence with Q m =l; (5) 
Hq from 50 to 65, ilk from to 0.84 for the untilted oCDM sequence, showing 
the strong £-shift of the acoustic peaks with fi^; (6) Slsh 2 varying from 0.003 to 
0.05 for the Hq = 50 f2 A =0 sCDM model; (7) an isocurvature CDM sequence with 
positive isocurvature tilts ranging from to 0.8; (9) sample defect C^s for textures, 
etc. from [15] - cosmic string Cgs from [16] are similar and also do not fare well 
compared with the current data. The bottom right panel is extended to low values 
to show the magnitude of secondary fluctuations from the thermal SZ effect for 
the ACDM model. The kinematic SZ Ci is significantly lower. Dusty emission from 
early galaxies may lead to high signals, but the power is concentrated at higher 
£, with a weak tail because galaxies are correlated extending into the £ < 2000 
regime. Forecasts of how accurate Ci will be determined for an sCDM model from 
MAP (error bars growing above £ ~ 700) and Planck (small errors down the Ci 
damping tail) are also shown. 
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by Planck. 

Accompanying the idealized bandpower forecasts are predictions for how 
well the cosmological parameters in inflation models can be determined af- 
ter integrating, i.e., marginalizing, the likelihood functions over all other 
parameters. In one exercise that allowed a mix of nine cosmological pa- 
rameters to characterize the space of inflation-based theories (the 1 1 above 
with uj er fixed and nt slaved to ft), COBE was shown to determine one 
combination of them to better than 10% accuracy, LDBs and MAP could 
determine six, and Planck seven [13]. All currently planned LDBs could also 
get two combinations to 1% accuracy, MAP could get three, and Planck 
five. With the current data, one can forecast four to 10% accuracy, which 
is actually borne out in detailed computations with the data. 

Two panels in Fig. 3 show what happens when two classes of pure isocur- 
vature models are considered, one a Gaussian isocurvature CDM model 
with the spectral index allowed to tilt arbitrarily, another a set of non- 
Gaussian cosmic defect models. Neither fare well with the current data. 
Primary anisotropies in defect theories are more complicated to calculate 
because non-Gaussian patterns are created in the phase transitions which 
evolve in complex ways and which require large scale simulations. The non- 
Gaussianity means that the comparison with the data should be done more 
carefully, without the Gaussian assumption that goes into the bandpower 
estimations. 

1.5 Current CMB and CMB+LSS Constraints: Although the pur- 
pose of this paper is to describe CMB analysis techniques, the comparison 
of the radically-compressed current bandpowers with the models in Fig. 3 
invite a brief description of the results. Explicit numbers are those derived 
in [4] for the "minimal" inflation-motivated 7-parameter set for the combi- 
nation of DMR and Boomerang, with a weak prior probability assumption 
on the Hubble parameter (0.45 < h < 0.90) and age of the Universe (> 10 
Gyr.) Including all other current CMB data as well gives very similar esti- 
mates and errors, reflecting the consistency between the older, less statisti- 
cally significant data and the new experiments. (DMR+DASI numbers [9] 
are quite close to DMR+Boomerang numbers, since the spectra are similar. 
For a more complete description, see [32, 56, 65, 4, 9, 7].) 

1.5.1 CMB-only Estimates: As the upper right panel of Fig. 3 suggests, 
the primordial spectral tilt n s is well determined using CMB data alone: n s 
= 0.97±;og. This is rather encouraging for the nearly scale invariant models 
preferred by inflation theory. The figure also shows that constraints on Q,^ 
will not be very good, but the strong dependence of the position of the 
acoustic peaks on Vtk means that it is better restricted: Qtot ~ 1-02 ± 0.06. 
The baryon abundance is also well determined, = 0.022 ± 0.004, near 
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the 0.019±0.002 estimate from deuterium observations in quasar absorption 
line spectra combined with Big Bang Nucleosynthesis theory. 

The isocurvature CDM models with tilt rii s > and the isocurvature 
defect models (e.g. strings and textures) [15, 16] shown in Fig. 3 clearly 
have more difficulty with the CMB bandpowers. 

There is a region of the spectrum that nominally seems to be out of 
reach of CMB analysis, but is actually partly accessible because ultralong 
waves contribute gentle gradients to our CMB observables. For example, 
the size of compact spatial manifolds has been constrained; [66] find for flat 
equal-sided 3-tori, the inscribed radius must exceed l.lxiss from DMR at 
the 95% confidence limit, where xiss ls the distance to the last scattering 
surface. For asymmetric 1-tori, the constraint is weaker, > 0.7xz ss , but 
still reasonably restrictive. It is also not as strong if the platonic-solid- 
like manifolds of compact hyperbolic topologies are considered, though the 
overall size of the manifold should as well be of order the last scattering 
surface radius to avoid conflict with the large scale power seen in the COBE 
maps [66]. 

1.5.2 CMB+LSS Estimates: We have always combined CMB and LSS 
data in our quest for viable models. For example, DMR normalization de- 
termines ag to within 7%, and comparing with the a$ ~ O.SSSl^ ' 56 tar- 
get value derived from cluster abundance observations severely constrains 
the cosmological parameters defining the models. This is further restricted 
when the possible variations in the spectral tilt allowed by the COBE data 
are constrained by the higher £ Boomerang data. More constrictions arise 
from galaxy- galaxy and cluster-cluster clustering observations: the shape 
of the linear density power spectrum must match the shape reconstructed 
from the data. In [32, 56, 65, 4], the LSS data were characterized by a 
simple shape parameter and a$, with distributions broad enough so that 
these prior probability choices would not be controversial, but encompass 
the ranges that almost all cosmologists would believe. 

Using all of the current CMB data and the LSS priors, [4] got Hq = 
56±9, Qa = 0.55±0.09, with n s , Qtot and f^/i 2 virtually unchanged from the 
CMB-only result. Apart from the significant detection, the dark matter 
is also strongly constrained, VL c d m h 2 = 0.13l;g|. Restricting VL to t to be unity 
as in the usual inflation models, with CMB+LSS, n b h 2 = 0.021 ± 0.003, 
tt cdm h 2 = 0.13 ± 0.01, ft A = 0.62 ± 0.07 and n s = 0.98±$. If the equation 
of state of the dark energy is allowed to vary, wq < —0.3 is obtained, be- 
coming substantially more restrictive when supernova information is folded 
in, wq < -0.7 at 2 a [65]. 

Fig. 4 gives a visual perspective from [65] on how the parameter estima- 
tions for the adiabatic models evolved as more CMB data were added (but 
with always the LSS prior included for this figure). With just the COBE- 
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Figure 4- This figure from [65] shows 2-a likelihood contours for the dark matter 
density cu c = fi CI j m h 2 and {Qk, Qa, n 3 , tut} for the LSS prior combined with weak lim- 
its on Ho (45-90) and cosmological age (> 10 Gyr), and the following CMB experi- 
mental combinations: DMR (short-dash); the "April 99" data (short-dash long-dash); 
TOCO+4.99 data (dot short-dash); Boomerang-NA+TOCO+4.99 data (dot long-dash, 
termed "prior-CMB"); Boomerang-LDB + Maxima-1 + Boomerang-NA + TOCO + 
4.99 data (heavy solid, all-CMB). These 2a lines tend to go from outside to inside as 
more CMB experiments are added. The smallest 2-a region (dotted and interior) shows 
SNl+LSS+all-CMB, when SNI data is added. For the SIa, n s and uib plots, f2 tot =l has 
been assumed, but the values do not change that much if Qtot floats. The main movement 
with the most recent data [4] is that u) c localizes more around 0.13 in all panels, and the 
u>b contour in the lower right panel migrates downward a bit to be in better agreement 
with the Big Bang nucleosynthesis result. 
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DMR+LSS data, the 2-cr contours were already localized in Q, c d m h 2 . With- 
out LSS, it took the addition of Maxima-1 before it began to localize. £lk 
localized near zero when TOCO was added to the April 99 data, more so 
when Boomerang-NA was added, and much more so when Boomerang-LDB 
and Maxima-1 were added. Some n s localization occurred with just "prior- 
CMB" data, l^/i 2 really focussed in with Boomerang-LDB and Maxima-1, 
as did f^A- If only DMR plus the most recent Boomerang results [4] to 
£ = 600 are used (the limit in [3]), the plot is rather similar. However, when 
all recent Boomerang [4] results to £ = 1000 are used, the inner contour 
in the Qdmh 2 direction sharpens up in all panels, and the f^/i 2 contour 
lowers to be in the good agreement with the big bang nucleosynthesis result 
indicated above. 

2. Computing Non-Gaussian Secondary Signals: the SZ Example 

The secondary fluctuations involve nonlinear processes, and the full panoply 
of A-body and gas-dynamical cosmological simulation techniques are being 
used to study them. It is realistic to hope that the thermal and kinetic 
Sunyaev-Zeldovich effects can be understood statistically this way with 
sufficient accuracy for CMB analysis, and this is the main example used in 
this section. On the other hand, star-bursting galaxies will be quite difficult 
to understand from simulations alone, but CMB analysis will be greatly 
aided by their point-source character for most experimental resolutions. 
Galactic foregrounds, however, cannot be "solved" by hydro calculations. 

2.1 Hydrodynamical Calculations: The Santa Barbara test [17] com- 
pared simulations of an individual cluster of 10 15 M@ done with a variety of 
hydrodynamical and N-body techniques, in particular grid-based Eulerian 
methods and both grid-based and smooth particle (SPH) Lagrangian meth- 
ods. (Eulerian grids are fixed in comoving space, Lagrangian grids adapt 
to the density of the flow.) Fig 5 shows this cluster, computed with the 
treePM-SPH code of Wadsley and Bond [19, 20], seen at the present in 
weak lensing, SZ and X-rays, at contour thresholds designed to represent 
potentially observable levels. For a review of the great strides made in SZ 
experimental methods and results on individual clusters up to 1998 see [18]. 

Another hydrodynamical example, the simulation of a supercluster [20, 
21] using the treePM-SPH code, is used to illustrate the computational 
challenges once we extend beyond individual cluster simulations. Fig. 7 
shows thermal SZ maps of the supercluster at redshift z = 0.5. The com- 
putation evolved from redshift z = 30 to the present a high resolution 104 
Mpc diameter patch with 100 3 /2 gas and 100 3 /2 dark matter particles, 
surrounded by gas and dark particles with 8 times the mass to 166 Mpc, in 
turn surrounded by "tidal" particles with 64 times the mass to 266 Mpc. 
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There were a total of 1.6 million particles including the medium and low 
resolution regions, but much larger simulations are possible in this era of 
massive parallelization. For example, a parallel tree-SPH code ("gasoline"), 
with timesteps that can vary from particle to particle, can routinely do 
256 3 gas and 256 3 dark matter simulations, and even 512 3 simulations on 
relatively modest (< 100 processor) BEOWULF systems [31]. 

The treePM technique is a fast, flexible method for solving gravity and 
can accurately treat free boundary conditions. The SPH part of the code 
includes photoionization as well as shock heating and cooling with abun- 
dances in chemical (but not thermal) equilibrium, incorporating all radia- 
tive and collisional processes. The species considered were H, H + , He, He + , 
He ++ and e~. The extra cooling associated with heavy elements injected 
into the intracluster media during galactic evolution was ignored, but so 
was the feedback on the medium from energy injected from galactic su- 
pernovae: the computation does not have high enough spatial resolution to 
calculate galaxy collapse. 

We now describe some aspects of simulation design for the supercluster 
simulation [20, 21] which are also relevant for the larger parallel calcula- 
tions. One first decides on the mass resolution to achieve, which is set by 
the lattice spacing of particles on an initial high resolution grid, cll, cho- 
sen to be ~ 1 Mpc comoving to ensure that there will be adequate waves 
to form the target objects, in this case clusters. 1 Next one needs to de- 
termine the spatial resolution of the gas and of the gravitational forces, 
preferably highly linked. In Lagrangian codes like treePM-SPH, this varies 
considerably, being very high in cluster cores, moderate in filaments, and 
not that good in voids. Since accurate calculations of the cluster cores were 
a target, resolution better than ~ 40 kpc was desired. The best resolution 
obtained was about 15 kpc. Given a^, the size of the high resolution part 
of the simulation volume is determined by CPU limitations on the number 
of particles that can be run in the desired time. This means the high reso- 
lution volume may distort considerably during the simulation. To combat 
this, progressively lower resolution layers were added to ensure accurate 
large scale tides/shearing fields operated on the high resolution patch. For 

x The mass resolution limits the high k power of the waves that can be laid down in 
the initial conditions (Nyquist frequency, ir/ai,), but for aperiodic patches there is no 
constraint at the low k end: the FFT was used for high k, but a power law sampling 
for medium k, and a log k sampling for low k, the latter two done using a slow Fourier 
transform, i.e., a direct sum over optimally-sampled k values, with the shift from one 
type of sampling to another determined by which gives the minimum volume per mode 
in fe-space. By contrast, more standard periodic "big box" calculations are limited by 
the fundamental mode, although one can alleviate this by nesting the high resolution 
box in lower resolution ones as in the supercluster simulation. Even so, this simulated 
supercluster could not be done in a periodic code with this number of particles because 
of gross mishandling of the significant large scale tidal forces. 
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Figure 5. SZ maps for the ITP comparison cluster seen at z = subtending the 
angles shown. The observing wavelength was taken to be in the Rayleigh- Jeans 
region of the spectrum, so AT/T = —2y here. The core dark regions interior to 
the white areas are above 32 x 10~ 6 ; the dark contours surrounding the white 
are at 2 x 10~ 6 , levels now accessible to ground-based instrumentation. At higher 
redshift, gaseous filaments bridge the subclusters which merge to make the final 
state. Even the far-field outskirts of these subclusters is observable, but precision 
below 10~ 6 would be needed to probe well the filaments. (See also Fig. 7.) 

the calculations shown, the high resolution region (grid spacing cll, 100 3 
sphere) sits within a medium resolution region (2cll, 80 3 ), and in turn 
within a low resolution region (Aa L , 64 3 ). The mean external tide on the 
entire patch is linearly evolved and applied during the calculation. 

2.2 Non-Gaussian Source Model: It is clear from Figs. 6 and 7 that 
the dominant signals are quite patchy if one is interested in amplitudes 
above a few [iK. This has the disadvantage for analysis that non-Gaussian 
aspects of the predicted patterns are fundamental, and an infinite hierarchy 
of connected iV-point spectra beyond the 2-point spectrum C( are required 
to specify them (as for defect models). However, it also seems reasonable 
that we could represent the results in terms of extended emission from 
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localized sources. This concentration of power is characteristic of many 
types of secondary signals. Indeed some, such as radiation from dusty star- 
burst regions in galaxies, can be treated as point sources since they are 
much smaller than the observational resolution of most CMB experiments. 
However, highly accurate emission from dusty galaxies is too difficult to 
calculate from first principles, and statistical models of their distribution 
must be guided by observations. 

Extended and point source models for the temperature fluctuations ex- 
press the signal as a projection along the line-of-sight of a 3D random field, 
Q(r,t), which is the convolution of profiles g(r\C,t) surrounding objects of 
some class C, with the comoving number density nc*(r,t) defining a point 
process for those objects [10, 28]. In [28, 29], the points were referred to 
as "shots" , from shot-noise, although of course continuous clustering of the 
shots as well as their uncorrelated Poissonian self-correlation has to be in- 
cluded. Simple derivations for the form of G(r,t) and g and the relation of 
the 2D power spectrum Ci to the 3D power spectrum of the Q (and hence 
of ne*) are given in [10]. 

For dusty star-burst regions, the shots are galaxies and the profile g 
involves the dust density and temperature [28, 29], while for the SZ effect, 
the dominant emission comes from clusters and groups and the profile g is 
a constant times the pressure [28, 10, 30]. 

Various attempts have been made to use hydro dynamical codes to do 
an entire region up to z = 2 or so. However, it should be clear from the 
discussion of the numerical setup that approximations are always needed 
to do this. One approach was to calculate the full pressure power spectrum 
from a single simulation, (\Q(k, t)\ 2 }, and appropriately project it to deter- 
mine the SZ Ci [22, 23]. A more ambitious approach was to tile the space 
with boxes out to some redshift [24, 25, 26, 27, 31], but because of computer 
limitations and resolution requirements, the boxes were relatively small and 
computationally expensive. To make predictions, tricks were required, e.g., 
"observing' translated and rotated versions of the few simulated boxes at 
many different output redshifts. 

2.3 The Shots and their Profiles: The shot model provides a way for- 
ward semi-analytically, by laying down pressure profiles around a catalogue 
of halos. The halos and their properties could be computed with iV-body- 
only simulations using much larger boxes than those required for hydro. 
Another approach is to use "peak-patches" , a method for identifying halos 
in the initial conditions of calculations which has been shown to be accu- 
rate for clusters [30]. For this approach, the entire space can be tiled with 
contiguous boxes that are smoothly joined, and have all the required long 
wavelength power to treat the really rare halo concentrations, or "super- 
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Figure 6. SZ maps derived from spherical presure-profilcs imposed upon halos 
identified up to z—2 with the peak-patch method, for three cluster-normalized 
cosmological models. The observing wavelength and contours are the same as in 
Fig. 5. The histograms show ACDM, oCDM and sCDM can be clearly differen- 
tiated. Blank field SZ surveys using interferometers and bolometer experiments 
promise to revolutionize our approach to the cluster system, especially at z > 0.5. 
The contributions from clusters below and above this redshift are shown on the 
right for the ACDM model. 

duper-clusters" , that can appear. 

What to do for the pressure profiles, p(r\C, t), is of course debatable. One 
strategy is to use profiles calculated from hydro simulations, but what has 
invariably been done is even simpler: pressure profiles scaled by bulk halo 
properties of the observed form derived from low redshift X-ray cluster 
data (and extrapolated to the more poorly known high redshift regime). 
For example, adopting a spherically symmetric isothermal "beta" profile 
is common: p(r\C,t) = p c (l + r 2 Irl^^yWlHiBe* - r), where p c is the 
central pressure, r core * is a core radius, and is an outer truncation 
radius exterior to which the local pressure contribution is taken to be zero, 
and interior to which i? is unity. The radius r here is comoving and the * 
subscript denotes comoving radii. (3 « 2/3 is a reasonable fit to the X-ray 
data. 

This rapid semi-analytic peak-patch method for simulating clusters us- 
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ing isothermal beta-profiles was used in the construction of the two SZ 
layers (thermal and kinetic) of the sandwich in Fig. 2. The panels of Fig 6 
show peak-patch-derived SZ maps for three cosmologies, and the lower right 
corner shows the distribution of AT/T in pixels, with its non-Gaussian dis- 
tinct tails. The right two panels contrast the contribution for the ACDM 
example of clusters below z = 0.5 with those above. The top panel of 
Fig. 7 shows the ACDM example with a filter scale appropriate for the 
Planck satellite resolution and with an ideal resolution for the cluster sys- 
tem, below two arcminutes, which interferometers and large single dishes 
can achieve. However, interferometers typically filter the low £, losing large 
scale power. The lower panel of Fig. 3 shows the SZ Cg spectrum derived 
for the flat ACDM simulation shown, contrasting the contributions from 
clusters and groups above z = 0.5 with all of them. Both Poissonian and 
clustering contributions are included, since the simulation has them. 

The power is not distributed in a democratic fashion as it is for Gaussian 
theories, but is concentrated in cold spots below 218 GHz, and in hot spots 
at higher frequencies. Thus, the naive visual comparison with bandpowers 
derived using a Gaussian assumption about the distribution of power is 
misleading. 

The assumption that the SZ effect is dominated by the high pressure 
regions associated with clusters and groups of galaxies, with only weak mod- 
ifications coming from the intercluster medium, is borne out visually in the 
supercluster simulation. The emission from the gas in filaments outside of 
the groups and poor clusters that reside in them is weak, below observable 
levels. What we can also see, however, is that an anisotropic pressure distri- 
bution is expected, with elongation along the filament directions, inviting 
improvements over spherical approximations. 

To properly analyze the SZ effect in ambient fields, having fast Monte 
Carlo simulation methods along the lines developed here is clearly essential. 
That does not take away from the challenge of how best to do the analysis of 
such distinctly non-Gaussian signals, which consist of extended rather than 
point-like sources. Even with relatively small beams, these will be somewhat 
confused by overlapping sources. At least with the cluster system, we may 
hope to model the individual elements with hydrodynamical simulations. 
For other secondary sources such as dusty galaxies, a priori theoretical 
models are not really feasible, and the properties of the sources will have 
to be derived from the observations. At least in some of those cases, for 
typical beam sizes, the emission can be treated as from point sources. 

2.4 Foreground Complications: The foreground signals from the inter- 
stellar medium are also non-Gaussian, but are not well-modelled by ex- 
tended sources, since their emission power spectrum Cg has been shown to 
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Figure 7. SZ maps subtending the angles shown for peak patches (above) and 
a single supercluster region seen at redshift 0.5. The core dark regions interior 
to the white areas are above 48 x 10~ 6 , and the dark contours surrounding the 
white are at 3 x 10~ 6 at 750/xm, levels that can be achieved with blank field 
SZ bolometer experiments. Filamentary bridges between the clusters are typically 
below 10~ 6 , but the far-field of clusters can be probed. The bottom left panel 
passes the supercluster region at z = 1 in the right panel through a simulated 30 
GHz HEMT-based interferometer with the indicated characteristics. 
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rise to lower £; further, no simplifications such as statistical isotropy apply. 
Direct hydrodynamical studies may increase our understanding of the ISM, 
but will not be able to provide strong guidance on statistical comparisons 
with data. The use of other data sets will clearly play a strong role in the 
final analysis; e.g., templates from the IRAS+DIRBE data is useful for 
modelling the distribution of shorter wavelength dust radiation. Exactly 
what to do for the analysis of foregrounds is under intense study and some 
current ideas are described in § 3. 

3. Acting on the Data 

The data comes in as raw timestreams, which must be cleaned of cosmic 
rays, obvious sectors of bad data, and other contaminants. Encoded in it 
are the sky and noise, as well as unwanted instrument, atmospheric and 
other residual signals. The pointing matrix identifies time-bits to a chosen 
pixel basis and allows the spatial signal to be separated from the noise; 
the resulting map completely represents the data if the pixelization is fine 
enough relative to the experiment's beam. At one time, the cosmology was 
drawn from the maps, perhaps using new bases beyond the spatial pixel 
ones, and possibly truncated to compress the data into more manageable 
chunks (e.g., using signal-to- noise eigenmodes to get rid of informationless 
noisy components of the map, [11]). 

The statistical distribution of various operators on the pixelized data 
can be estimated from the map. In particular, the power spectrum Ci repre- 
sents a radical compression of the data, but of course is no longer a complete 
representation of the map. However, under the assumption that the sky 
maps only contain Gaussian signals characterized by an isotropic power 
spectrum, this is a complete representation of the statistical information 
in the map when the binning of the bandpowers is fine enough. For such 
Gaussian theories, such as those usually arising in inflation, cosmological 
parameters can be quickly estimated directly from the power spectrum. The 
radical compression has allowed millions of theoretical models to be con- 
fronted with the data in the large cosmological parameter spaces described 
above, e.g., [32], a situation infeasible if full statistical confrontation of all 
the models with the maps was required. 

Basic aspects of the Boomerang, Maxima, DASI and CBI pipelines are 
described in [33], [35, 37, 4, 38], [36, 39], [9] and [8]. Pipelines were also de- 
veloped for DMR and for QMAP. The MAP pipeline as currently envisaged 
is described in [40]. The Planck pipeline is still very much under discussion 
and development, complicated by the necessity of delivering high precision 
component maps and derived parameters from such a huge volume of data. 

3.1 Bayesian Chains: The data analysis pipeline can be viewed as a 
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Bayesian likelihood chain: 

V (parameter s\data, theoretical framework) 
= V(data timestreams\maps, noise) 
®V (noise\noise — obs) 
®V(maps\signals) 

®V '(signal s\parameters) 
®V (parameter s\prior knowledge) 
<g>[l/V(data\theoretical framework)} (1) 

Each conditional probability in the unravelling of the chain is very complex 
in full generality. This means the exact statistical problem is probably not 
solvable, and only applying simplifying approximations to sections of the 
chain have made it tractable. Of particular widespread use has been the 
assumption that both noise and signals are Gaussian-distributed; others 
are being explored, especially by Monte Carlo means, but much remains to 
be done for us to attain the high precision levels that have been forecast. 

We now describe the various terms, beginning with the last two. The 
priors V (parameter s\prior knowledge) for the cosmological parameters are 
often taken to be uniform, but can represent information from other exper- 
iments, e.g., LSS observations, prior CMB experiments, age or Hq con- 
straints. The hope is that the new information from the rest of the prob- 
ability chain will be so concentrated that the precise nature of the prior 
choice will not matter very much. As we have seen, prior information from 
non-CMB experiments is crucial, even given the most ideal CMB experi- 
ment, in order to break near-degeneracies among cosmological parameters. 

The quantity V(data\theoretical framework) is an overall normaliza- 
tion. It can address whether the overall theory is crazy as an explanation 
of the data. It has some characteristics in common with goodness-of-fit 
concepts in "frequentist" probability analysis. 

We often refer to entropies S = InV in the Gibbs sense so the Bayesian 
chain involves a sum of the individual conditional entropies that make up 
the whole. 

3.2 Signal/noise separation: The information comes in the form of a dis- 
cretized timestream, with d c t as the data for the bit of time t in frequency 
channel c, and a pointing vector on the sky, q c (t). The sky is gridded into 
pixels, and these are used to define signal maps which are functions of di- 
rection and frequency. We denote these by A cp , for the signal at pixel p 
and frequency channel c. The pointing matrix, P c t p , is an operator map- 
ping time-bit t to pixel p with some weight. The difference d — -PA is the 
noise, n, plus further residuals 1Z, which may be sky-based or experimental 
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systematics. The probability of a given timestream can be written as 

V(d\A, r), P ctp ) = J] 5{d ct - PcnAcp + K ct + net))- (2) 

ct p 

3.2.1 Map-making: Map-making for a given channel involves using just 
the information encoded in d and P to separate what is sky signal (and 
residuals) from what is noise, i.e., the construction of V(A\d). 

The pointing matrix is a projection operator, with many more time- 
bits than pixels. The simplest example for P ctp is the characteristic set 
function for the pixel Xp(Qc(t)) (one inside the pixel, zero outside). For a 
chopping experiment such as MAP, the two pixels where the two beams 
are pointing (separated by 141°) are coupled at a given time-bit. There are 
also choices to be made about scattering the information in the timestream 
among pixels. For example, one can include the beam in A or in P. At 
the moment the former is preferred, but there are good arguments for the 
latter if the beam changes in time (e.g., asymmetric beams with rotations). 
In that case, Xp(Qc(t)) is replaced by A p B(q p — q c (t)), where B is the beam 
function and A p is the pixel area, and the derived signal A cp does not have 
the beam function convolved with it as it does in the "top hat" Xp case. Of 
course, any other function normalized to the pixel area upon integration 
can be used, e.g., approximations to the beam which might not contain all 
sidelobe information. As long as the signals are appropriately convolved, 
and the pixelization choices are fine enough, the results should be identical. 
Pointing uncertainty further complicates the simplicity of P. 

In map-making, there is an implicit assumption on the prior probability 
of the map, V(A\prior), that it is uniform so any value is a priori possi- 
ble. The map distribution should involve integration over all possible noise 
distributions; in practice, only the measured noise is used, which delivers 
the maximum likelihood map A cp and a pixel noise correlation CV iCpc /p', 
allowing for possible channel-channel correlations in the noise. 

For many previous experiments, one could just compute the average A 
and variance Cat of the measured d's contribution to each pixel, relying on 
the central limit theorem to ensure a Gaussian-distribution. However, these 
"naive maps" , described below in more detail, also relied on an implicit as- 
sumption, namely that r] was uncorrelated from time-bit to time-bit, which 
is not correct with the 1/f noise in detectors. The noise was also often so 
strong compared with the signal that the latter could be ignored in solving 
for A cp ; this is no longer the case. 

So far it has been essentially universal to assume that i] is Gaussian- 
distributed stationary noise with noise power w~ l (u) a smooth function. 
Thus, the noise is completely specified by its covariance c„ irfiC / t / = S[(r] ct r] c / t >)], 
which is assumed to be a function of t — t', hence diagonal in the time 
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frequencies u>. Here, S denotes the smoothing operation. Drifts and non- 
Gaussian elements can be included with timestream filtering and in the 
catch-all residual 1Z using time templates (see below). Ignoring 1Z for the 
moment, the probability of the time-ordered data given a map and noise- 
power w -1 , V(d\A,w), is just a Gaussian: 

V(d\A,w) = exp[-(d - PA)^w(d - PA)/2]/{(2Tr) Ntb ^/ 2 ^det(w- 1 )} 
= exp[-(A - A)*W N (& - A)/2]/{(27r)^-/ 2 v /det(^ 1 )} 
x exp [-rfwfj / 2] /{ (2vr) (NthUa ~ N ^/ 2 y'det (w' 1 ) } , (3) 

W N A = P^wd, W N = P^wP = C N l = (nn^f 1 , (4) 

n = A - A = C N P^wr), fj = (rj\d) = d - PA, 

w = w(l- PiP^wP^P^w), P^wfj = 0. (5) 

The maximum likelihood solution for w(cu) is (5[|FT(d — PA)) 2 ])" 1 , where 
FT denotes the Fourier transform operator. Note that the fj noise term is 
multiplied by a projector which is perpendicular to the spatial sector; i.e., 
CnP^wv=0- The first Gaussian in eq.(3) is V(A\A). We can interpret it as 
the integration over the spatial Gaussian random noise field n defined in 
eq.(4) of the product of "P(A|A,n) = 5(A — (A + n)) and the distribution 
of n, exp[-ntC^ 1 n/2]/[(2vr) Ar p-/ 2 [detCj V ] 1 / 2 ] . 

3.2.2 General cf. Optimal Maps: Any linear operation from timestream 

to generalized pixels, A^ M ^ = Ai^d defines a map, albeit not an opti- 
mal one as for the maximum likelihood solution eq.(4). The map-noise 
n (M) = A (M) - A( M ) has a correlation matrix = M^w~ l M. Here 

the relation of the true sky signal seen in the .M-filter-space to the "optimal 
map" signal A is A^ M ^ = Ai^ PA. Of special interest are therefore classes 
of operators At for which Ai^P = P^Ai is a projector, i.e., self-adjoint and 
equal to its square. For example, Ai^ = (P^ w^P) -1 P^ w* satisfies Ai^P = I, 
the identity, for any w* provided (P^w^P) is invertible. The price one pays 
for such a non-optimal map is enhanced noise, with noise-weight matrix 
= (P^w if P)(P^w,w~ 1 w,P)~ 1 (P^w if P) which has maximum eigen- 
values if w * = w, but is otherwise "less weighty". 

An interesting example is when w* is taken to be a constant and Pt p is 
zero or one depending if the pointing at a time-bit t lies within the pixel p 
or not; in that case P^ P just counts the number of time-bits that fall into 
each pixel, and A^^ is just the average of the anisotropics over these. If we 
use rjtrjt' in place of its smoothed version w^ 1 , the noise matrix just counts 
the variance in the amplitudes of the pixel hits about the mean. For this 
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reason, 

(ptp)-ipt^ i s ca ll e d a "naive map" [4]. For other cases, computing 
the error matrix may not be simple if w is not known. 

For the full maximum likelihood map, even with stationarity, the con- 
volution of w with d appears to be an 0(Nf bits ) operation, but it is really 
O(Ntbits) because w(t — t') generally goes nearly to zero for t — t' S> 0. Sim- 
ilarly, the multiplication of the pointing matrix is also 0(N t uts) because of 
its sparseness. Thus, we can reduce the timestream data to an estimate of 
the map and its weight matrix in only 0(N? ix ) operations. 

3.2.3 Iterative Map-making Solutions: An iterative method [35] has 
proved quite effective to solve this: the map on iteration j + 1 is esti- 
mated from W*A(j + i) = W^A^ + P^w^rj^, where the noise and noise 
power on iteration j are determined from r/^ = d — PA^, w7}(uj) = 

S[\FT(r](j})\ 2 (Lu)]. Here W* = P^w*P is a matrix chosen to be easy to 
invert: e.g., a constant w* was used for Boomerang [35]. As the maps con- 
verge, the noise orthogonality condition holds, so the solution is the correct 
one. What w(uj) looks like is white noise at high temporal to, crashing to 
zero because of 1// noise in the instruments and time-filtering. These low 
frequencies correspond to large spatial scales, and the iteration converges 
slowly there, inviting multigrid speedup of the algorithm, which is now be- 
ing implemented [37]. The final A and CV are computed using d and the 
converged in eq.(4). In some instances, we can work directly with WnA 
and Wn rather than A and CV, avoiding the costly 0(N^ ix ) inversion. If 
sections of the map are cut out after Wn determination, through a projec- 
tor Xcut which is zero outside and one inside the cut, the cut weight matrix 
is (XcutW^ 1 Xcut)" 1 ■ This corresponds to integration over all possible values 
of the now unobserved A's outside of the cut, which increases the errors 
on the regions inside because of the correlation. Unfortunately, two matrix 
inversions are required, and the first one is potentially quite large even if 
the cut is severe. 

Hinshaw et al. [40] describe the current pipeline for MAP. To a good 
approximation the MAP noise is white, so wij\ = w* can drop out, and 
the critical element is computation of _P T P, which is more complex than 
counting pixel-hits because MAP is a difference experiment. The MAP 
team choose W* to be diag(P^w*P), but otherwise use the same iterative 
algorithm. 

Since P is all that differentiates signal from noise in the timestream, 
it is essential that it be sufficiently complex; in practice this means many 
cross linkages among pixels in the scan strategy. This is because there are 
random long term drifts in the instruments that make it hard to measure the 
absolute value of temperature on a pixel, though temperature differences 
along the path of the beam can be measured quite well because the drifts 
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are small on short time scales. If there are not sufficient cross- links, the 
maps are striped along the directions of the scan pattern, and the weight 
matrix Wn has to be called upon to demonstrate that these features are 
not part of the true sky signal we are searching for. The rapid on-board 
differencing for MAP effectively eliminates this. 

3.3 Pixelization: Pixels are any discrete basis that give a complete rep- 
resentation of the data. Square top-hat functions tiling the space in a grid 
quite a bit smaller than the beam size were the norm in the past. As the size 
of the data increased, more elaborate hierarchical schemes were needed. For 
COBE, the Quadrilateralized Spherical Cube was used: the sky was bro- 
ken into six base pixels corresponding to faces of a cube. Higher resolution 
pixels were created by dividing each pixel into four smaller pixels of approx- 
imately equal area, in a tree with pixels that are physically close having 
their data stored close to each other. There are (quite) small errors associ- 
ated with the projections of the sphere onto the faces. At resolution r, there 
are 6 x 2 2 ( r ~ 1 ) pixels. The COBE data came with resolution r = 6, 6144 
pixels of size 2.6°, and we sometimes use resolution 5, still safely smaller 
than the 7° beam. 

For large sky coverage, it is beneficial to have a pixelization which 
is azimuthal, with many pixels sharing a common latitude, to facilitate 
fast spherical harmonic transforms (FSHTs) [41] between the pixel space, 
where the data and inverse noise matrix are simply defined, and multi- 
pole space, where the theories are often simple to describe. The FSHT 
operation count is a O(N^) operations, with 0{N p i x In N^ ix ) potentially 
possible to achieve. HEALPix[42] is an example which has been adopted 
for Boomerang, MAP and Planck: it has a rhombic dodecahedron as its 
fundamental base, which can be divided hierarchically while remaining az- 
imuthal. There is also an extensive software package that accompanies it 
to allow manipulations. See also [43] for the alternative hierarchical "igloo" 
choice, with many HEALPix features, and some other advantages. 

In some cases generalized pixels are used, involving direct projection of 
the data onto spatially-extended mode functions. The first extensive use 
of this was for the SK95 dataset. Signal-to-noise eigenfunctions used in 
data compression for COBE and SK95 are another example [11, 14]. Direct 
projection onto pixels in wavenumber space, with the mode functions being 
discrete Fourier transform waves, also form a fine alternative basis. Still 
there is nothing like a real-space visual representation to explore strange 
features in the data. 

3.4 Filtering in Time and Space: Getting rid of unwanted systematic 
effects in the data has been fundamental throughout the history of CMB ex- 
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periments. Sometimes this was done in hardware, e.g., rapid Dicke switching 
from one direction in the sky to another in "two-beam" chop experiments 
and "three-beam" chop-and-wobble experiments; sometimes it was done in 
editing, e.g., bad atmospheric contamination, cosmic rays; and now it is 
invariably done in software, e.g., 1// noise in bolometers or HEMTs, spu- 
rious spin-synchronous signals associated with rotating platforms. In the 
linear operation J2t' 3~tt'dt' ° n the timestream, the filter matrix Ttt' is often 
time-translation invariant, hence diagonal in frequency. High pass filters 
are an example, translating through the time-space pointing matrix P tp 
into primarily low £ spatial filtering, with the penalty the removal of part 
of the target sky signals as well as the unwanted ones. 
3.4.1 Spatial Filters and Wavenumber Space: In direct filtering of 
the maps, the signal amplitude A p in a pixel p can be written in terms of 
generalized pixel-mode functions J- p / m and the spherical harmonic compo- 
nents of the anisotropy, a£ m : A p = ^em^p/matm- Fp,lm encodes the basic 
spatial dependence (e.g., Y( m (q p )) and filters, including the experimental 
beam Be m and pixelization effects at high £, and the switching strategy at 
low i. 

A Fourier transform representation of the signals in terms of wavenum- 
ber Q is often justified. There is a projection from the sphere with coordi- 
nates (9, 4>) onto a disk with radial coordinate w = 2 sin(0/2) and azimuthal 
angle (j) which is an area-preserving map and one-to-one — except that one 
pole gets mapped into the outer circumference of the disk. It is highly dis- 
torted for angles beyond 180°, but even for the 140°-diameter COBE NGP 
and SGP cap maps of Fig. 8 it is an excellent representation. The associated 
discrete Fourier basis for the disk is close to a continuum Fourier basis Q, 
with |Q| =£+1/2. 

Instead of a cap,_consider a rectangular patch of size L x L. The am- 
plitude A p = ^Q^>(Q) a Q then involves a mode-sum with differential 
element L 2 d 2 Q/(2-7r) 2 = f s k y 2QdQd(f)/(2ir), where the fraction of the sky 
in the patch is f s k y = L 2 /Air. The effective number of modes contributing 
to t is therefore g# = f s ky(^ + 1)> the usual 21 + 1 in the f s ky=^ all-sky 
case. Of course the reduction by just f s k y is approximate, since modes with 
wavenumber below the fundamental one of 2ir/L hardly contribute, inviting 
a more sophisticated relation for g£ to characterize effects from filtering, dis- 
creteness and apodization (smooth weighting of the target region) [4]. The 
decomposition of the spatial-mode function in this "momentum space" now 
involves a Fourier phase factor associated with the pixel position instead of 
a Yira-, the beam £>(Q) and the rest, U P (Q), which includes discretization 
into time bins and pixelization as well as the switching-strategy and any 
long wavelength filtering done in software. (In the spherical harmonic de- 
composition, U p ^ mm i is a function of vn! as well as the Im and possibly the 
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pixel position.) 

Examples of U P (Q) are given in [10]. For switching experiments through 
a throw angle w t hrow, U P (Q) is a first power of 2sm(Q-w t f irow /2) if a 
chop, a second power if a chop-and-wobble. A separable form U p (Q) = 
n(w p \wc)u(Q) , involving a spatial-mask function [i centered at some point 
mc and a position-independent filter u(Q), is appropriate for the recent 
Boomerang analysis [4], with ii(Q) a combination of temporal and spatial 
filters and [i unity inside an ellipsoidal region, zero outside it. 

Beams B C (Q) for each channel c must be determined experimentally, 
typically through the patterns of point source on the sky, but also through 
detailed "optics" computations of the telescope setup. Most often there 
is a nice monotonic fall-off from the central point to relatively low levels 
of power, though there is invariably at least some beam-asymmetry and 
lurking side lobes. In the past, it was common practice to use circularly- 
symmetric monotonically-falling B^s in CMB analysis, even "circulariz- 
ing" pixelization effects, but now highly accurate treatments are needed to 
achieve our goal of great precision. The full width half maximum 9f w hm 
is the usual way to quote beam scale. If a Gaussian approximation to the 
profile is appropriate, the Gaussian smoothing scale is £ s 810(10 / /^/ w / lTn ) 
in multipole space. 

3.4.2 Signal Correlation Matrices and Window Functions: For isotropic 
theories for which the spectra Cte are only functions of £, the pixel-pixel 
correlation function of the signals can be expressed in terms of N p i x x N p i x 
"Q-window" matrices W pp '/: 



Ct pp ' = (ApAp/) = I[W pp '/C T e], W pp >, 



(7) 



It is expressed in terms of 1(f), the discrete "logarithmic integral" of a 
function /. The average We = J2 p =i Wpp//N p i x is often used to charac- 
terize the ^-sensitivity of the experiment, since \Jz[WeCTe] gives the rms 
anisotropy amplitude. For the simple case of the COBE map we have 
Wpp',e = W nPe(cos9 pp i) in terms of the Legendre polynomials for i > 2, 
in which both beam and pixelization effects are taken into account in Wj>. 
In general, the implicit isotropization that makes this only a function of 
£ does not work at high £ because of the pixelization, and this can sig- 
nificantly complicate the treatment. Switching and other spatio-temporal 
filters applied in software or hardware further complicate the expressions. 
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3.5 Templates in Time and Space: Templates are specific temporal 
patterns that we probe for in the timestream, v c t,A- Templates could have 
frequency structure (c) as well. They are often associated with specific spa- 
tial patterns T cp ^A that we can probe for in the map or directly in the 
timestream with v c t t A = J2p Pctp^cp,A- The unknowns are the amplitudes 
ka (and the errors on them), and possibly other parameters characterizing 
their position, orientation and scale. The residual signal associated with 
the templates is therefore 1Z c t = J2a v ct,A^A- Templates can model con- 
tamination by radiation from the ground, balloon or sun, components of 
atmospheric fluctuations and cosmic ray hits. Often they are synchronous 
with a periodic motion of the instrument. 

In maps, spatial templates have been used in the removal of offsets, 
gradients, dipoles, quadrupoles and looking for point sources and extended 
sources (e.g., IRAS/DIRBE structures). Sample forms for T cp are: a con- 
stant for an average offset, q p i for the three dipole components, q p %q p j — S^ /2 
for the five quadrupole components, the pattern of dust emission in the 
IRAS/DIRBE maps, and the beam pattern for point sources, T cp = B c (q p — 
Qsource)- Even the individual pixels themselves define spatial templates. 

Since v c t t A is of the same mathematical form as P c t p we can solve si- 
multaneously for A cp and ka, delivering an extended map A and R, with 
appropriate error covariances for each, including a cross term. The assump- 
tion is that the a priori probability for the template amplitudes k is uniform, 
as for the map amplitudes. 

Often we just want to remove an undesired pattern from the map. We 
can do this by directly projecting it from the map, and modifying the map 
statistics. One can also marginalize the unobserved ka, i.e., integrate over 
them. A nice way to deal with this is to assume that the prior probability 
for k is Gaussian with zero mean and correlation Kaa' = (ha^A') ■ Letting 
the dispersion become infinite reduces to the projection. The explicit result 
of marginalizing over the template amplitudes yields modified noise weights 
and a modified average map: 

w = w — wv{v^wv + K~ 1 )~ 1 v' f w = (w^ 1 + vKv^)^ 1 , (8) 
W N A = P^wd, W N = P^wP, C N = W N l . 

Letting the eigenvalues of K — > oo gives the uniform prior, and in that case 
the modified w has the template vectors projected out, i.e., wv=0. 

The effect of marginalizing out k is therefore to enhance the noise. One 
price to pay is that time-translation invariance of the modified w is lost: i.e., 
w is not diagonal in temporal frequency u, and cannot be computed directly 
by the FFT, followed by smoothing. However, the relationship eq.(8) allows 
c n = w" 1 to be computed with one N K x N K matrix inversion in addition 
to the FFT. 
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3.6 "P(signals|parameters): The signal component of the map, 

S U)cp + X] ^cp,AKA , (9) 
3 A 

includes the frequency-independent primary, S(i) cp say, foreground and sec- 
ondary anisotropy components su\ cp and additional template-based signals 
^cpA K A- 

3.6.1 Gaussian Signals and Wiener Filters: For primary anisotropics, 
the signal is often assumed to be Gaussian, 

lnV(s\y) = - \s^Ct 1 s - ±Tr In C T - N pix In y/2w , (10) 

where Ct{d) = W^ 1 is the theory pixel-pixel correlation matrix and we 
have denoted the set of cosmological parameters it depends on by y a . (Tr 
denotes trace.) A great advantage of assuming all signals and noise to be 
Gaussian is that marginalizing all signal amplitudes J2 s (j)cp an d template 
amplitudes ka yields another simple Gaussian: 

]nV(K\y) = -^W t A+±TrlnW t -N pix \nV2^, (11) 

W t = (C N + Ct)- 1 , W t = {Cn + Ct)' 1 , Cn = C n + TKTt , 

W t = W t - [T^Wtl^K- 1 + T+WtTJ-^TtWi] . 

As for the time templates, the marginalization can be thought of as inducing 
enhanced noise in the template spatial structures. 

The other part of the probability chain, ln'P(A|A), gives a Gaussian for 
the signal given the observations, with mean the Wiener filter of the map: 

lnP(A|A) = -i(A - <A|A»tC A i(A - <A|A» 

-±TrlnC A A - Npi X ]ny/2^, 

(A|A) = C T W t A, (SASA^A) = C T - C T W t C T . (12) 

The brackets indicate an ensemble average. The lemma 

B(A + B)- 1 = (A' 1 + B- 1 )-^- 1 (13) 

between two invertible matrices A and B js convenient to remember. Thus 
the Wiener filter can also be written as (Wn + Wt)~ 1 Wn A. 

We can also determine the distribution, given the observations, of the 
various component signals and of the noise; all of these are Gaussians 
with means and variances given by: 

(s U) \A) = C T{j) W t A, (SsftSs^) = C nj) 5 jr - C TU) W t C T{r) , 

(n|A) = C N W t A, (5n5n^) = C N - C N W t C N , 

5s U) = S U) - ( s (i)P)' 5n = n- (n|A). (14) 
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3.6.2 Non-Gaussian Signals and Maximum Entropy Priors: For 

foregrounds and secondaries, all higher order correlations are needed since 
the distributions are non-Gaussian. One sample non-Gaussian prior distri- 
bution that has been analyzed for these signals and even for primaries is the 
so-called maximum entropy distribution, which has been widely used by ra- 
dio astronomers in image construction from interferometry data. Although 
"max-ent" is often a catch-all phrase for finding the maximum likelihood 
solution, the implementation of the method involves a specific assumption 
for the nature of V(s/j\ cp \theoTy). For positive signals, s > 0, it is derived 
as a limit of a Poisson distribution, P n = ^ ra e _M /n!, in the Stirling formula 
limit, 

In n! = T(l + nQ « fi( ln(K) - K + hi(2^<)/2 : 
InPHi/) = (-CMC) +C-1)M where C = *//*■ (15) 

The subdominant ln(27rjuC) term is dropped and the Poisson origin is largely 
forgotten, replaced by an interpretation involving the classic Boltzmann 
entropy, — Chi£, with the constraint on the average (£) = 1 and (i now 
interpreted as a measure. 

Another example of positive signals is standard emitting sources. These 
often have power law distributions over some flux range, with leading term 
InV = — 7ln£. These have to be regulated at small and/or large £ to 
converge. 

For symmetric positive and negative signals, a possible form is 

S P (s {j) ) = lnV(s {j) \y) = -x \n(x + y/l + x 2 ) + y/l + x 2 - 1, 
x 2 = s^Cf Jjsy), (16) 

which reduces to — x 2 /2 in the small fluctuation (small x) limit, but has non- 
Gaussian wings. This form has been used in CMB forecasting for Planck by 
[44, 47], but any form which retains the basic —x 2 /2 limit for small x and 
a levelling off at high x can have a similar effect. For example, instead of 
the — arcsinh(x) for dlnP/dx, the more drastic deviation from Gaussian, 
— arctan(x), has been used in radio astronomy. 

Of course, in spite of the analytic form for Sp, the integration over the 
various S(j) to get S(A\y) does not have an analytic result. Nonetheless, 
iterative techniques can be used to solve for the maximum entropy solution, 
and errors can be estimated from the second derivative matrix of the total 
entropy. 

3.7 Cleaning and Separating: Separating foregrounds and secondaries 
from the primary CMB into statistically accurate maps is a severe challenge. 
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It has been common to suppose s^ cp = f(j)c s (j)p is separable into a product 
of a given function of frequency times a spatial function. But this is clearly 
not the case for foregrounds, and for most secondary signals, although for 
some, such as the Sunyaev-Zeldovich effect, it is nearly so (Fig. 1). Another 
approach which is crude but reasonably effective is to separate the signals 
using the multifrequency data on a pixel-by-pixel basis, but the accuracy 
is limited by the number of frequency channels. 

3.7.1 Gaussian cf. Maximum Entropy Component Separation: It is 

clearly better to incorporate the knowledge we do have on the srj\ cp spatial 
patterns. This has been done so far by either assuming the prior probability 
for 'P(s(j) cp |theory) is Gaussian or of the max-ent variety, or is an amplitude 
times a template. Even the obviously incorrect Gaussian approximation for 
the foreground prior probabilities has been shown to be relatively effective 
at recovering the signals in simulations for Planck performed by Bouchet 
and Gispert [45]; see also [46]. 

The Poisson aspect of the maximum entropy prior makes it well-suited 
to find and reconstruct point sources, and more generally concentrated ones: 
for Planck-motivated simulations, it has been shown to be better at recov- 
ery than the Gaussian approximation [47]. Since errors are estimated from 
the second derivative matrix, non-Gaussian aspects of the errors are ig- 
nored. Further, foregrounds and secondary anisotropics have non-Gaussian 
distributions which are certainly not of the max-ent form, and can only be 
determined by Monte Carlo methods, simulating many maps — and then 
only if we know the theory well enough to construct such simulations, a tall 
task for the foregrounds. Fortunately, with current data these issues have 
not been critical to solve before useful cosmological conclusions could be 
drawn, but much more exploration is needed to see what should be done 
with separation for the very high quality Planck and MAP datasets. 

3.7.2 Component Separation with Templates: For foreground re- 
movals, maps at higher or lower frequencies than the target one, where the 
foregrounds clearly dominate, can be used as the templates. The marginal- 
ized formulas given in §3.5, 3.6 get rid of the templates but their am- 
plitudes and distribution may also be of great interest. The other part 
of the likelihood (before marginalization) is IiiV(k\A); with the Gaussian 
prior assumption, it is a Gaussian distribution with the mean a weighted 
template-filtering of the data, akin to a Wiener-filtering: 

(k\a) = [K- 1 + r^w t r]- 1 r^Wt'K -»• [r^w t r]- 1 r^w t 'K, 

C KK = (5k5k^ I A) = [Ttj^T]- 1 , 5k = k (k\A). (17) 

An example [49] of the use of templates was comparing Wiener maps 
and various statistical quantities for the SK95 data under the hypothesis 
that it was pure CMB or CMB plus a "dust template" - a cleaned IRAS 
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100/um map with strong input from the DIRBE 100 and 240 fim maps 
[48]. A correlated component in the 30-40 GHz SK95 data was found [49], 
but at a low level compared with the CMB signal, i.e., not much dust 
contamination, in agreement with what other methods showed [50]. 
3.7.3 Point Source Separation with Templates: A useful method 
for finding localized sources is to look for known or parameterized non- 
Gaussian patterns with the position, and possibly orientation and scale of 
the templates being allowed to float. An example is (optimal) point source 
removal, in which the map is modelled as a profile times an amplitude k 
plus noise plus the other signals present [51]. As the source position varies, 

— 1/2 

eq.(17) defines a 7c-map. The dimensionless map v = C KK 7c gives the num- 
ber of a a signal with the beam pattern shape has. If \p\ is small, e.g., below 
2.5<7, it will be consistent with the Gaussian signals in the map; if large, 
e.g., above 5a, it will stick out as a non-Gaussian spike ripe for removal. 
The question then is how many a one cleans to, and what impact setting 
such a threshold has on the statistics of the cleaned map, A' = A — T(re| A) 
with its corrected weight matrix W[ = W t — W t TC K T^W t . (Actually W[ 
always acts on A, so removing T(k|A) is not necessary in the limit of a 
wide prior dispersion for k.) 

Fig. 8 shows an application of this method to DMR. A comparison is 
also made between the mean noise field and the 7c field: they are quite 
similar. 

A similar optimal filter was used in [52] for ideal forecasting and in [53] 
for SZ source finding (based on the beta-profile). Another approach is to not 
pay such attention to the detailed optimal filter, but use a Gaussian-based 
filter with variable scale to try to identify point sources. [54] used a Mexican 
hat filter, i.e., the second derivative of a Gaussian. This continuum wavelet 
method is optimal only in the case of a scale invariant signal, a Gaussian 
beam and no noise. In the pure white noise limit with no signal, the straight 
Gaussian is optimal. The best choice lies in between. 

3.8 Comparing: The simplest way to compare data sets A and B is to 
interpolate on overlap regions dataset A onto the pixels of dataset B. This 
of course will not always be possible, especially if the pixels are generalized 
ones, e.g., modal projections such as in the SK95 dataset. We would also 
like to compare nearby regions even if overlap is only partial. This requires 
an extrapolation as well as interpolation. In [55], a simple mechanism was 
described based upon Bayesian methods and a Gaussian interpolating the- 
ory for comparing CMB data sets. The statistics of the two experiments 
is defined by a joint probability for the two datasets, V(A, B\Ct), where 
the theory matrix Ct has AB as well as AA and BB parts connecting it. 
A probability enhancement factor [3 = In V(A\B,Ct)/V(A\Ct) was intro- 
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Figure 8. The upper left panel shows the unfiltered DMR map (combined 
53+90+31 GHz data) of a 140° diameter region centered on the South Galac- 
tic Pole, with the contours indicated, (b) and (d) show the Wiener-filtered signal 
and noise maps, while (c) shows the fluctuations of height vo, with \v\ > 1.5. The 
lower left panel shows the same for the North Galactic Pole. Note the similarity 
between (c) and (d), that is that the Point Sources found at these low contours are 
consistent with random concentrations of noise, the logical conclusion within the 
statistics: there are no rare events that are good source candidates. We chose such 
a low cut so that one could see the effects of making 2 or 3a cuts just by counting 
in, each positive contour being 0.5 higher than the previous. The right panels show 
the contours of v rather than AT/T for the individual (A+B) frequency maps. 
A true source would be expected to persist in the three maps, as well as being 
in the combined one, the strength variation across channels depending upon the 
emission mechanism. Again there are no obvious hits. Apart from being a useful 
way to detect point sources, this is a nice mechanism for looking for anomalous 
local non-Gaussian patterns, to be investigated to determine if we are dealing with 
funny noise or true signals by appropriate follow-up investigations. 



272 



duced, which is invariant under A, B interchange. 

What was visually instructivewas to compare the A-data Wiener-filtered 
onto B-pixels, (sb\Aa) = CT,BAWt,AA^A, with the Wiener-filter of B-data, 
(sb\Ab) = CT,BBWt : BB^B- They should bear a striking resemblance ex- 
cept for the errors. The example used in [55] was SK95 data for A and 
MSAM92 for B, which showed remarkable similarities: indeed that the two 
experiments were seeing the same sky signal, albeit with some deviations 
near the endpoint of the scan. Although these extrapolations and interpo- 
lations are somewhat sensitive to the interpolating theory, if it is chosen to 
be nearly the best fit the results are robust. 

The same techniques were used for an entirely different purpose in [56], 
testing whether two power spectra with appropriate errors, in this case from 
Boomerang and Maxima, could have been drawn from the same underlying 
distribution. The conclusion was yes. 

3.9 Compressing: We would like to represent in a lossless way as much 
information as we can in much smaller datasets. Timestreams to maps (and 
map-orthogonal noise) are a form of compression, from Ntuts to N p i x . Map 
manipulations, even for Gaussian theories, are made awkward because the 
total weight matrix involves Cat, often simplest in structure in the pixel 
basis, and Ct, which is often naturally represented in a spherical harmonic 
basis. Signal-to-noise eigenmodes £k are a basis in which C n 1 ^ 2 CtC N 1 ^ 2 
is diagonal, hence are statistically independent of each other if they are 
Gaussian. They are a complete representation of the map. Another S/N 

basis of interest is one in which C^C^C^! 2 is diagonal, since C^ 1 comes 
out directly from the signal/noise extraction step. Finding the S/N modes 
is another 0{N^ ix ) problem. Typically, falls off dramatically at some 
k, and higher modes of smaller eigenvalues can be cut out without loss 
of information. These truncated bases can be used to test the space of 
theoretical models by Bayesian methods e.g., [64], and determine cosmic 
parameters. Further compression to bandpowers is even better for more 
rapid determination of cosmic parameters. 

The related compression concept of parameter eigenmodes mentioned 
in § 1.4 finds linear combinations of the cosmological or other variables y a 
that we are trying to determine which are locally statistically-independent 
on the multidimensional likelihood surface. It provides a nice framework 
for dealing with near-degeneracies. 

3.10 Power Spectra and Parameter Estimation: Determining the 
statistical distributions of any target parameters characterizing the theo- 
ries, whether they are cosmological in origin or power amplitudes in bands 
or correlation function in angular bins, is a major goal of CMB analysis. 
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For power spectra and correlation functions, it is natural that these would 
involve pairs of pixels. Operators linear in pixel pairs are called quadratic 
estimators. Maximum likelihood expressions of parameters in Gaussian the- 
ories also often reduce to calculating such quadratic forms, albeit as part of 
iteratively-convergent sequences. Hence the study of quadratic estimators 
has wide applicability. 

3.10.1 Quadratic Estimators: Estimating power spectra and correla- 
tion functions of the data has traditionally been done by minimizing a x 2 
expression, 

.2 w-mrnrrt 



X 2 = Tr[W (AA - C(y))W(AA ] - C(y))}/4, (18) 

where W is some as yet unspecified weight matrix. The 1/4 is from the 
pair sum. The critical element is to make a model C(y) for the pixel pair 

correlation AA^ which is as complete a representation as possible. For 
example, if we adopt a linear dependence about some fiducial C* = C(y*), 
i.e., C(y) = a + E /3 C'^/, then 

52r a f,6yf > = ±Tr[WC a W(Krf-C*)], C a = dC/dy a , (19) 



T aP = \Tx[WC a WCp\ - \Tx[WC a pW{AA ] - C*)], C af3 = ^-j , 

Fisher information matrix : F af3 = (T a p) = ^Tr[W C ' a W C p] . (20) 

Since J- a n has A dependence, this is not strictly a quadratic estimator. 
However, it is if the Fisher matrix F Q/ g, the ensemble average of F a p, is 
used. Iterations with F rather than T converge to the same y* if y is 
continually updated by the 5y. C* = Cn + Ct has been assumed. 

The correlation in the parameter errors can be estimated from the av- 
erage of a quartic combination of pixel values, 

(SySy^) = F-^TrlWCaWCWCpWCjf- 1 

+F- l \Ti[WC a W{5C)]TT[WC l3 W{5C)}F- 1 , 

where (SC) = C - C*. Note, (SySy^) = T~ x if C = C* and W = C" 1 . 

Another approach is to estimate the errors by taking the second deriva- 
tiveofx 2 , d X 2 /dy a dyf 3 = F aP : 

{d 2 X 2 /dy a dyp) = \Tr[WC a WC p ) - ±Tr[W C ' a pW (SC)} 
-► F a/3 as (SC) -► 0. 



These two limiting cases hold for the maximum likelihood solution as we 
now show. 
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3.10.2 Maximum Likelihood Estimators & Iterative Quadratics: 

The maximum likelihoodjsolution for a Gaussian signal plus noise is of the 
form eq.(19) with W = W t = (Cjy + Ct(v))~ 1 , the "optimal" weight which 
gives the minimum error bars. This is found by differentiating — 21n£ = 
Tr[W t AA f )] - Tr[ln W t ] + N pix ln(2vr). Of course the yp dependence of W t 
means the expression is not really a quadratic estimator. A solution proce- 
dure is the Newton-Raphson method, with the weight matrix updated in 
each iterative improvement Syp to y*p, at a cost of a matrix inversion. At 
each step we are doing a quadratic estimation, and when the final state has 
been converged upon, the weight matrix can be considered as fixed. Solv- 
ing for the roots of d In C /dy a using the Newton-Raphson method requires 
that we calculate d 2 In C/dy a dyp, the curvature of the likelihood function. 
Other matrices have been used for expediency, since one still converges on 
the maximum likelihood solution; in particular the curvature matrix expec- 
tation value, i.e., the Fisher matrix F a p, which is easier to calculate than 
T a p. Further, to ensure stability of the iterative Newton-Raphson method, 
some care must be taken to control the step size each time the yp are up- 
dated. Since the (Syadyp) estimate is just [^ r ~ 1 ]a / 3 ) this matrix must at least 
be determined after convergence to characterize the correlated errors. 

Note that only in the Gaussian and uniform prior cases is the integra- 
tion over stj\ analytically calculable, although maximum likelihood equa- 
tions can be written for simple forms that can be solved iteratively, e.g., 
maximum entropy priors. 

3.10.3 The Bandpower Case: The bandpower associated with a given 
window function ip^ of a theory with spectrum Cte is defined as the average 
power, 



with X defined by eq.(7). Many choices are possible for <p^, but the sim- 
plest is probably the best: e.g., Xb(£) which is unity in the ^-band, zero 
outside. Another example uses the average window function of the exper- 
iment, WVxftW; others use window functions related to relative amounts 
of signal and noise. There is some ambiguity in the choice for the filter ip^ 
[57, 58], but often not very much sensitivity to its specific form. 

We want to estimate bandpowers q b taken relative to a general shape 
rather than the flat shape: q b = IfabtOrt) /I(w4 S ') > so (Ob = Qb{C^) b - 
We therefore model AA^ by 



( c e)b = AVb&T^/lWbt], 



(21) 



C pp '(q) - C N , pp , + C b, P p'1 b = C *,pp' + H C b,pp' 6< l b i 



b b 



C b %> = l[Wp p/ , eX b(i)Ci s) ] , 5q b = q b -ql 



(22) 
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Here the q\ are the bandpowers on a last iteration, ultimately converging 
to maximum likelihood bandpowers if the relaxation is allowed to go to 
completion, 5q b — ► 0. The pixel-pixel correlation matrices Cxbrn 1 = C b s ].„,q b 
for the bandpowers b follow from eq.(6). 

The usual technique [14] is to use this linear expansion in the q b , the 
downside being that the bandpowers could be negative - when there is 
little signal and the pixel pair estimated signal plus noise is actually less 
than the noise as estimated from CV- By choosing ex.p(q b ) rather than q b 
as the variables, positivity of the signal bandpowers can be assured. (The 
summation convention for repeated indices is often used here.) 

The full maximum likelihood calculation has been used for bandpower 
estimates for COBE, SK95, Boomerang and Maxima, among others. The 
cost is (D(Np ix ) for each iteration, so this limits the number of pixels that 
can be treated. For N p i x > 2 x 10 5 or so, the required computing power 
becomes prohibitive, requiring 640 Gb of memory and of order 3 x 10 17 
floating-point operations, which translates to a week or more even spread 
over a Cray T3E with ~ 1024 900-MHz processors. This will clearly be 
impossible for the ten million pixel Planck data set. Numerical roundoff 
would be prohibitive anyway. 

The compressed bandpowers estimated from all current data and the 
satellite forecasts shown in Fig. 3 were also computed using the maximum 
likelihood Newton- Raphson iteration method (for rather fewer "pixels"). 
3.10.4 A Wavenumber-basis Example: There is an instructive case in 
which to unravel the terms of eq.(19). If we assume the noise and signal- 
shape correlation matrices in the wavenumber basis Q of § 3.4, Cn,qq> and 

(s) 1 

^Tb QQ' ' are functions only of the magnitude of the wavenumber, Q « £+ ^, 
then 

£ W = \Y.9QW\Q)C^ b {Q){C est {Q)-C N {Q)), (23) 

b' Q 

Fw = \Y.9QW\Q)C^ b {Q)C^ b ,{Q) , 
Q 

W(Q) = (C N (Q)+J2C? b (Q)qtr 1 C est (Q)^ |^|A Q | 2 . 

b 

Here gQ is the effective number of modes contributing to the AQ=1 wavenum- 
ber interval. 

In the all-sky homogeneous noise limit, this of course gives the correct 
result, with g Q = f sky (2£ + l) and C$(Q) = C ( e s) W e Xb^), where W t is the 
window (the angle-average of |n(Q)| 2 fi| in the language of § 3.4.1). Since 
Xb(£)Xb'(ty = 'WCOXbCOi Fbb 1 i s diagonal, with a value expressed in terms 
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of a band-window ipw. 

F bb = m = Mg Q /{2Q))W 2 {Q)cf ) wlxb^) • (24) 

It may now seem that we have identified up to a normalization the correct 
band- window to use in this idealized case, with a combination of signal and 
noise weighting. However the ambiguity referred to earlier remains, for what 
we are actually doing in estimating the observed q b is building a piecewise 

(s) 

discontinuous model relative to the shape C\ . The analogue for a given Cxe 

is the model J2e XbitylTbC^ • Since for the theory, q b = Z[<PbepTe]/Z[ t Pb£Pe > '*]i 
any ip^ satisfying (fbeXb'ity = fu wm do to have qxb the ratio of target- 
bandpower to shape-bandpower. 

The downweighting by signal-plus- noise power is just what is required 
to give the correct variance in the signal power estimate in the limit of 
small bin size: 

((AC Tb f) = Ftf(cP)l - l {CT ^f mf - 1 . (25) 

In the sample- variance dominated regime, the error is CTi\Z?Jgl>, while in 
the noise-dominated regime it is C^t\j2j gijYJ picking up enormously 
above the Gaussian beam scale. Recent experiments are designed to have 
signal-to-noise near unity on the beam scale, hence are sample-variance 
dominated for lower I. The interpretation of the (Cn£ + Cr^) -1 weighting 
factor is that correlated spatial patterns associated with a wavenumber i 
that are statistically high just because of sample variance are to be down- 
weighted by Cti- 

Although these results are only rigorous for the all-sky case with ho- 
mogeneous and isotropic noise, they have been used to great effect for 
forecasting errors (§3.11) for f s ^ y < 1 regions. 

3.10.5 Faster Bandpowers: Since the need for speed to make the larger 
datasets tractable is essential, much attention is now being paid to much 
faster methods for bandpower estimation. 

A case has been made for highly simplified weights (e.g., diagonal in 
pixel space) that would still deliver the basic results, with only slightly 
increased error bars [60]. The use of non-optimal weighting schemes has 
had a long history. For example, a quadratic estimator of the correlation 
function was used in the very first COBE DMR detection publication, with 
diag{WN) for W, a choice we have also used for COBE and the balloon- 
borne FIRS. Here one is estimating correlation function amplitudes C a , 
where a refers to an angular bin: C pp i = Cn pp i + ^2 a C a Xa{G pp ')-, where 
cos{9 pp i) = q p • q p i and Xa(&) is one inside and zero outside of the bin. As 
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for bandpowers, the distribution is non-Gaussian, so errors were estimated 
using Monte Carlo simulations of the maps. 

Any weight matrix W which is diagonal in pixel space is easy and fast 
to calculate. In [60] it was shown that making W the identity was adequate 
for Boomerang-like data. The actual approach used a fast highly binned 
estimate of C(6), which can be computed quickly, which was then lightly 
smoothed, and finally a Gauss-Legendre integration of C(9) was done to 
estimate (C) b . 

One can still work with the full W = Wt if the noise weight matrix 
C^ 1 has a simple form in the pixel basis. For the MAP satellite, it is 
nearly diagonal, and a method exploiting this has been applied to MAP 
simulations to good effect [61]. 

Since many of the signals are most simply described in multipole space, 
it is natural to try to exploit this basis, especially in the sample- variance 
dominated limit. In the Boomerang analysis of [4, 38], filtering of form 
n(w p )u(Q)B(Q) was imposed. Many of the features of the diagonal ansatz 
of § 3.10.4 in wavenumber space were exploited, complicated of course by 
the spatial mask fi, which leads to mode-mode coupling. The main approxi- 
mation was to use the masked transforms for the basic isotropic Q functions, 
expressing where needed the masked power spectra as linear transforma- 
tions on the true underlying power spectrum in £-space that we are trying 
to find. Monte Carlo simulations were used to evaluate a number of the 
terms. Details are given in [4, 38] and elsewhere. 

More could be said about making power spectrum estimation nearly 
optimal and also tractable, and much remains to be explored. 
3.10.6 Relating Bandpowers to Cosmic Parameters: To make use of 
the q b for cosmological parameter estimation, we need to know the entire 
likelihood function for q b , and not just assume a Gaussian form using the 
curvature matrix. This can only be done by full calculation, though there 
are two analytic approximations that have have been shown to fit the one- 
point distributions quite well in all the cases tried [57]. The simplest and 
most often used is an "offset lognormal" distribution, a Gaussian in z b = 
ln(q b + q^), where the offset q b N is related to the noise in the experiment: 

M = \^°/^] 1/2 -l, (26) 

where q b is the maximum likelihood value. To evaluate q b N in eq.(26), the 

curvature matrix evaluated in the absence of signal, J 7 ^) °, is needed. For 
Boomerang and Maxima analysis, the two Fisher matrices are standard 
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output of the "MADCAP" maximum likelihood power spectrum estimation 
code [59]. Another (better) approach if the likelihood functions £({q b }) are 
available for each q b is to fit ln£ by — ^[(z — z)b] 2 g 2 , and use gbGwdb' 

in place of J 7 ^) , where Gw = diagi K J-~ 1 ) 1 / 2 J-diag{J-~ 1 ) 1 / 2 . Here diagQ 
denotes the diagonal part of the matrix in question. Values of q b N and other 
data for a variety of experiments are given in [57] (where it is called x b ). It 
is also possible to estimate q b N using a quadratic estimator [4]. To compare 
a given theory with spectrum Cxe with the data using eq.(26), the q b, s need 
to be evaluated with a specific choice for ip^. Although this should be band- 
limited, oc Xbe, as described in § 3.10.4, the case can be made for a number 
of alternate forms. 

3.11 Forecasting: A typical forecast involves making large simplifications 
to the full statistical problem, such as those used in § 3.10.4. For example, 
the noise is homogeneous, possibly white, and the fluctuations are Gaussian. 
This was applied to estimates of how well cosmic parameters could be 
determined for the LDBs and MAP and Planck in the 9 parameter case 
described earlier [13], and for cases where the power spectrum parameters 
are allowed to open up beyond just tilts and amplitudes to parameterized 
shapes [62]. The methods were also applied when CMB polarization was 
included, and when LSS and supernovae were included [63]. 

The procedure is exactly like that for maximum likelihood parameter 
estimation, except one often does not bother with a realization of the power 
spectrum, rather uses the average (input) value and estimates errors from 
the ensemble average curvature, i.e., the Fisher matrix: F^ 1 is therefore a 
measure of (Sy a 5y^), where 5y a = y a — y a is the fluctuation of y a about 
its most probable value, y a . As mentioned above, when the y a are the 
bandpowers instead of the cosmic parameters, we get the forecasts of power 
spectra and their errors, as for MAP and Planck in Fig. 3. Those results 
ignored foregrounds. However other authors have treated foregrounds with 
the Gaussian and max-ent prior assumptions in this simplified noise case 
[45, 47]. 

Clearly forecasting can become more and more sophisticated as realistic 
noise and other signals are added, ultimately to grow into a full pipeline- 
testing simulation of the experiment, which all of the CMB teams strive to 
do to validate their operations. 

Challenge: The lesson of the CMB analysis done to date is that almost 
all of the time and debate is spent on the path to a believable primary 
power spectrum, with the cosmological parameters quickly dropping out 
once the spectrum has been agreed upon. The analysis challenge, requir- 
ing clever new algorithms, gets more difficult as we move to more LDBs 
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and interferometers, to MAP and to Planck. This is especially so with the 
ramping- up efforts on primary polarization signals, expected at <10% of 
the total anisotropies, and the growing necessity to fully confront secondary 
anisotropies and foregrounds simultaneously with the primary signals. 
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